Term-based Identification of Sentences for Text Summarisation
Authors
Abstract
This paper describes a methodology for automatic summarisation of Greek texts that combines terminology extraction and sentence spotting. Since generating abstracts has proven a hard NLP task of questionable effectiveness, the paper focuses on producing a special kind of abstract, called an extract: a set of sentences taken from the original text. These sentences are selected on the basis of the amount of information they carry about the subject content. The proposed corpus-based, statistical approach exploits several heuristics to determine the summary-worthiness of sentences. Specifically, it uses term occurrence statistics (the TF·IDF formula) and several cue phrases to calculate sentence weights, and then extracts the top-scoring sentences, which form the extract.
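The scoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact weighting, so the sketch assumes a sentence weight equal to the sum of TF·IDF values of its distinct terms plus a fixed bonus per matched cue phrase. The function names (`tokenize`, `idf_table`, `extract`), the `cue_bonus` parameter, and the word-level tokenisation (in place of real terminology extraction) are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude word tokenizer; \w is Unicode-aware in Python 3, so Greek works too.
    # Assumption: single words stand in for extracted terms.
    return re.findall(r"\w+", text.lower())


def idf_table(corpus):
    # Inverse document frequency over a reference corpus (list of documents).
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def extract(sentences, corpus, cue_phrases=(), cue_bonus=1.0, k=2):
    # Assumed weighting: sum of TF*IDF over a sentence's distinct terms,
    # plus cue_bonus for every cue phrase the sentence contains.
    idf = idf_table(corpus)
    tf = Counter(t for s in sentences for t in tokenize(s))
    scored = []
    for index, sentence in enumerate(sentences):
        terms = set(tokenize(sentence))
        weight = sum(tf[t] * idf.get(t, 0.0) for t in terms)
        weight += cue_bonus * sum(p in sentence.lower() for p in cue_phrases)
        scored.append((weight, index))
    # Keep the k highest-weighted sentences, then restore document order.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]
```

Calling `extract(sentences, corpus, cue_phrases=("in conclusion",), k=2)` returns the two highest-weighted sentences in their original order. Terms that occur in every corpus document receive an IDF of zero and so contribute nothing to a sentence's weight.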
Similar resources
Unsupervised Biographical Event Extraction Using Wikipedia Traffic
Biographical summarisation can provide succinct and meaningful answers to the question “Who is X?”. Current supervised summarisation approaches extract sentences from documents using features from textual context. In this paper, we explore a novel approach to biographical summarisation, by extracting important sentences from an entity’s Wikipedia page based on internet traffic to the page over ...
The influence of personal pronouns for automatic summarisation of scientific articles
In automatic summarisation, statistical methods based on tokens’ frequency are commonly used in combination with other methods or on their own to extract important sentences from a text. Quite often researchers justify the relatively poor performance of these statistical methods by the fact that they do not consider the anaphoric relations between words. In this paper, we perform a comprehensiv...
Impact of Citing Papers for Summarisation of Clinical Documents
In this paper we show that information from citing papers can help perform extractive summarisation of medical publications, especially when the amount of text available for development is limited. We used the data of the TAC 2014 biomedical summarisation task. We report several methods to find the reference paper sentences that best match the citation text from the citing papers (“citances”). ...
A Rhetorical Status Classifier For Legal Text Summarisation
We describe a classifier which determines the rhetorical status of sentences in texts from a corpus of judgments of the UK House of Lords. Our summarisation system is based on the work of Teufel and Moens where sentences are classified for rhetorical status to aid sentence selection. We experiment with a variety of linguistic features with results comparable to Teufel and Moens, thereby demonst...
Automatic Annotation of Corpora for Text Summarisation: A Comparative Study
This paper presents two methods which automatically produce annotated corpora for text summarisation on the basis of human produced abstracts. Both methods identify a set of sentences from the document which conveys the information in the human produced abstract best. The first method relies on a greedy algorithm, whilst the second one uses a genetic algorithm. The methods allow to specify the ...